18 research outputs found

Survey on Causal-based Machine Learning Fairness Notions

Authors: Makhlouf Karima, Palamidessi Catuscia, Zhioua Sami
Publication date: 18/01/2021

Addressing the problem of fairness is crucial to safely use machine learning algorithms to support decisions with a critical impact on people's lives, such as job hiring, child maltreatment, disease diagnosis, and loan granting. Several notions of fairness have been defined and examined in the past decade, such as statistical parity and equalized odds. The most recent fairness notions, however, are causal-based and reflect the now widely accepted idea that using causality is necessary to appropriately address the problem of fairness. This paper examines an exhaustive list of causal-based fairness notions, in particular their applicability in real-world scenarios. As the majority of causal-based fairness notions are defined in terms of non-observable quantities (e.g. interventions and counterfactuals), their applicability depends heavily on the identifiability of those quantities from observational data. In this paper, we compile the most relevant identifiability criteria for the problem of fairness from the extensive literature on identifiability theory. These criteria are then used to decide on the applicability of causal-based fairness notions in concrete discrimination scenarios.

arXiv.org e-Print Archive

Machine learning fairness notions: Bridging the gap with real-world applications

Authors: Makhlouf Karima, Palamidessi Catuscia, Zhioua Sami
Publication venue: Elsevier BV
Publication date: 31/12/2020

Fairness has emerged as an important requirement to guarantee that Machine Learning (ML) predictive systems do not discriminate against specific individuals or entire sub-populations, in particular minorities. Given the inherent subjectivity of the concept of fairness, several notions of fairness have been introduced in the literature. This paper is a survey that illustrates the subtleties between fairness notions through a large number of examples and scenarios. In addition, unlike other surveys in the literature, it addresses the question: which notion of fairness is most suited to a given real-world scenario, and why? Our attempt to answer this question consists of (1) identifying the set of fairness-related characteristics of the real-world scenario at hand, (2) analyzing the behavior of each fairness notion, and then (3) fitting these two elements to recommend the most suitable fairness notion in every specific setup. The results are summarized in a decision diagram that can be used by practitioners and policymakers to navigate the relatively large catalog of ML

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

HAL-Polytechnique

Survey on Causal-based Machine Learning Fairness Notions

Authors: Makhlouf Karima, Palamidessi Catuscia, Zhioua Sami
Publication venue: HAL CCSD
Publication date: 31/12/2020

INRIA a CCSD electronic archive server

HAL-Polytechnique

Identifiability of Causal-based Fairness Notions: A State of the Art

Authors: Makhlouf Karima, Palamidessi Catuscia, Zhioua Sami
Publication date: 07/06/2022

Machine learning algorithms can produce biased outcomes or predictions, typically against minorities and under-represented sub-populations. Therefore, fairness is emerging as an important requirement for the large-scale application of machine learning based technologies. The most commonly used fairness notions (e.g. statistical parity, equalized odds, predictive parity, etc.) are observational and rely on mere correlation between variables. These notions fail to identify bias in the presence of statistical anomalies such as Simpson's or Berkson's paradoxes. Causality-based fairness notions (e.g. counterfactual fairness, no-proxy discrimination, etc.) are immune to such anomalies and hence more reliable for assessing fairness. The problem with causality-based fairness notions, however, is that they are defined in terms of quantities (e.g. causal, counterfactual, and path-specific effects) that are not always measurable. This is known as the identifiability problem and is the topic of a large body of work in the causal inference literature. This paper is a compilation of the major identifiability results which are of particular relevance for machine learning fairness. The results are illustrated using a large number of examples and causal graphs. The paper would be of particular interest to fairness researchers, practitioners, and policy makers who are considering the use of causality-based fairness notions, as it summarizes and illustrates the major identifiability results.

Comment: arXiv admin note: text overlap with arXiv:2010.0955

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Polytechnique

Identifiability of Causal-based Fairness Notions: A State of the Art

Authors: Makhlouf Karima, Palamidessi Catuscia, Zhioua Sami
Publication venue: HAL CCSD
Publication date: 03/01/2023

INRIA a CCSD electronic archive server

Identifiability of Causal-based ML Fairness Notions

Authors: Makhlouf Karima, Palamidessi Catuscia, Zhioua Sami
Publication venue: HAL CCSD
Publication date: 04/12/2022

Machine learning algorithms can produce biased outcomes or predictions, typically against minorities and under-represented sub-populations. Therefore, fairness is emerging as an important requirement for the large-scale application of machine learning based technologies. The most commonly used fairness notions (e.g. statistical parity, equalized odds, predictive parity, etc.) are observational and rely on mere correlation between variables. These notions fail to identify bias in the presence of statistical anomalies such as Simpson's or Berkson's paradoxes. Causality-based fairness notions (e.g. counterfactual fairness, no-proxy discrimination, etc.) are immune to such anomalies and hence more reliable for assessing fairness. The problem with causality-based fairness notions, however, is that they are defined in terms of quantities (e.g. causal, counterfactual, and path-specific effects) that are not always measurable. This is known as the identifiability problem and is the topic of a large body of work in the causal inference literature. The first contribution of this paper is a compilation of the major identifiability results which are of particular relevance for machine learning fairness. To the best of our knowledge, no previous work in the field of ML fairness or causal inference provides such a systematization of knowledge. The second contribution is more general and addresses the main problem of using causality in machine learning, that is, how to extract causal knowledge from observational data in real scenarios. This paper shows how this can be achieved using identifiability.

HAL-Polytechnique

Finding a Needle in a Haystack: The Traffic Analysis Version

Authors: Makhlouf Karima, Qasem Abdullah, Zhioua Sami
Publication venue: Walter de Gruyter GmbH
Publication date: 01/04/2019

Traffic analysis is the process of extracting useful or sensitive information from observed network traffic. Typical use cases include malware detection and website fingerprinting attacks. High-accuracy traffic analysis techniques use machine learning algorithms (e.g. SVM, kNN) and require splitting the traffic into correctly separated blocks. Inspired by digital forensics techniques, we propose a new network traffic analysis approach based on similarity digests. The approach features several advantages compared to existing techniques, namely fast signature generation, compact signature representation using Bloom filters, efficient similarity detection between packet traces of arbitrary sizes, and, in particular, dropping the traffic splitting requirement altogether. Experiments show very promising results on VPN and malware traffic, but poor results on Tor traffic, mainly due to the single-size cells feature.

Directory of Open Access Journals

(Local) Differential Privacy has NO Disparate Impact on Fairness

Authors: Arcolezi Héber Hwang, Makhlouf Karima, Palamidessi Catuscia
Publication venue: Springer Nature Switzerland
Publication date: 19/07/2023

Best Paper Award. In recent years, Local Differential Privacy (LDP), a robust privacy-preserving methodology, has gained widespread adoption in real-world applications. With LDP, users can perturb their data on their devices before sending it out for analysis. However, as the collection of multiple pieces of sensitive information becomes more prevalent across various industries, collecting a single sensitive attribute under LDP may not be sufficient. Correlated attributes in the data may still lead to inferences about the sensitive attribute. This paper empirically studies the impact of collecting multiple sensitive attributes under LDP on fairness. We propose a novel privacy budget allocation scheme that considers the varying domain size of sensitive attributes. This generally led to a better privacy-utility-fairness trade-off in our experiments than the state-of-the-art solution. Our results show that LDP leads to slightly improved fairness in learning problems without significantly affecting the performance of the models. We conduct extensive experiments on three benchmark datasets using several group fairness metrics and seven state-of-the-art LDP protocols. Overall, this study challenges the common belief that differential privacy necessarily leads to worsened fairness in machine learning.

INRIA a CCSD electronic archive server

HAL-Polytechnique